DOCUMENT RESUME

ED 075 492                                        TM 002 571

AUTHOR        Jongsma, Eugene A.
TITLE         Viewing Standardized Social Studies Achievement Tests
              from a Reading Perspective.
PUB DATE      Nov 72
NOTE          17p.; Paper presented at Annual Meeting of Mid-South
              Educational Research Association (New Orleans,
              Louisiana, November 1972)

EDRS PRICE    MF-$0.65 HC-$3.29
DESCRIPTORS   *Achievement Tests; Psychometrics; *Reading Skills;
              *Social Studies; *Standardized Tests; Test
              Interpretation; *Test Validity

ABSTRACT
A critical examination is made of standardized social
studies achievement tests from a psychometric and reading
perspective. Five major issues are identified that detract from the
meaningful interpretation of student performance on standardized
social studies tests. The issues discussed are (1) the reading-
dependency of social studies items, (2) the picture-dependency of
social studies items, (3) the cognitive skills assessed by social
studies items, (4) the lack of an adequate system of item
development, and (5) the lack of content validity. Each of the issues
is defined and related to test validity and interpretation. The
central theme of the study is the inadequacy of the content validity
of most standardized social studies tests. Unless test publishers
specify more explicitly the elements of content and types of behavior
sampled on their tests, test users will have great difficulty in
making meaningful interpretations of student performance.
(Author/DB)






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE
OF EDUCATION POSITION OR POLICY.



VIEWING STANDARDIZED SOCIAL STUDIES ACHIEVEMENT TESTS 
FROM A READING PERSPECTIVE 



Eugene A. Jongsma
Louisiana State University in New Orleans



Paper presented at the Annual Meeting of the Mid-South
Educational Research Association
New Orleans, November 1972


It comes as no surprise that a close relationship exists
between reading ability and achievement in the social studies. Virtually
all standardized social studies tests involve some reading. There are
some exceptions, such as Preston and Duffey's Primary Social Studies
Test (1967), which uses pictures, but by and large reading is required
on nearly all standardized social studies tests.

How important is this relationship? To what extent is performance
on a standardized social studies test influenced by reading ability and
general test-taking skills? We (Gaines and Jongsma, 1972) recently
conducted a study in which we attempted to raise student performance on
a standardized achievement test by teaching the students a few basic
reading and test-taking skills. The students were lower socioeconomic
fifth graders who we assumed were not "test-wise." We developed an
instructional package called "Test-Taking Tips" which consisted of
illustrations and exercises for the students to work. The unit covered
five major topics: (1) motivation, (2) following directions, (3) guessing,
(4) reading comprehension, and (5) test behavior. A random selection of
students worked through the unit in approximately one hour the day before
the standardized test was administered. Results showed that students
who worked through the unit made significantly higher scores on several
of the subtests than their counterparts who had not seen the unit. The
social studies section was one of the subtests on which significant
differences were found. That is, we improved the students' social
studies achievement, not by teaching them anything about social studies,
but by alerting them to a few reading and test-taking skills. Another
interesting sidelight to this study was the unusually high correlation
of .83 which we found between total reading score and the social studies
subtest. However, similarly high correlations between reading
achievement and social studies achievement have been reported by other
researchers (Thomas, 1967; Wash, 1968; Gaines, 1971). This suggests
the influence reading ability has on performance on social studies tests.
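
To make the size of such a correlation concrete, the following is a minimal sketch in Python of how a product-moment correlation between two sets of scores is computed, and of what a coefficient of .83 implies about shared variance. The function name and the six pairs of scores are hypothetical illustrations, not data from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six pupils (NOT data from the study).
reading = [12, 18, 25, 31, 36, 40]
social_studies = [10, 15, 22, 24, 33, 35]

r = pearson_r(reading, social_studies)
shared_variance = r ** 2  # proportion of variance the two scores have in common

# For the reported coefficient of .83, the squared value is about .69.
reported_shared = 0.83 ** 2
```

Read this way, a correlation of .83 means that roughly two-thirds of the variance in the social studies subtest scores is shared with the reading scores.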

The purpose of this paper is to present a critical analysis of
current standardized social studies achievement tests. The analysis
will be based on psychometric considerations and the influence of
related reading skills. Five major issues will be identified and
discussed.

1. Reading-Dependency of Social Studies Test Items

One of the formats that is commonly found on standardized social
studies tests is the procedure whereby the student is given a passage
to read followed by multiple-choice comprehension questions. This
format looks no different from conventional reading comprehension tests
except for the social studies content. The assumption is that the
multiple-choice items are based directly on the passage and that the
student must comprehend the passage in order to correctly answer the
items. That is, if the items are reading-dependent, students will not
be able to obtain better than a chance score without having read the
prerequisite passage. Thus, reading-dependency refers to the
relationship between multiple-choice test items and the passage on which
they are based.

Several studies have shown that such is not the case. Preston
(1964) found that the mean score of a group of students not reading
the passage was significantly greater than chance on the Reading
Comprehension section of the Cooperative English Test. Weaver and Bickley
(1967) randomly selected items from several standardized reading
comprehension tests listed in Buros' Sixth Mental Measurements Yearbook and
administered them to college students. Subjects answering without the
passages answered 67 percent as many items as the subjects who had
access to the passages. Mitchell (1967) obtained similar results with
fourth graders using the Gates Basic Reading Test. Tuinman (1970) found
a lack of reading-dependency for items on the STEP Reading Test. While
all these studies have used reading comprehension tests, I suspect we
would find similar results with standardized social studies tests that
use the same format.
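
The comparisons reported in these studies rest on two simple quantities: the score expected from blind guessing, and the performance of a no-passage group relative to a with-passage group. The sketch below shows one way to compute them in Python. The function names are hypothetical, the figures are invented for illustration, and the z statistic assumes that every item has the same number of options and that guesses are independent.

```python
from math import sqrt

def chance_score(n_items, n_options):
    """Expected number correct when every item is answered by blind guessing."""
    return n_items / n_options

def z_against_chance(mean_score, n_items, n_options, n_students):
    """z statistic for a group mean score against the blind-guessing expectation.

    Under pure guessing each student's score is Binomial(n_items, 1/n_options),
    so the group mean has standard error sqrt(n_items * p * (1 - p) / n_students).
    """
    p = 1 / n_options
    standard_error = sqrt(n_items * p * (1 - p) / n_students)
    return (mean_score - n_items * p) / standard_error

def relative_no_passage_performance(mean_without, mean_with):
    """No-passage group mean expressed as a fraction of the with-passage mean."""
    return mean_without / mean_with

# Hypothetical figures, NOT taken from any of the cited studies.
expected = chance_score(n_items=40, n_options=4)
z = z_against_chance(mean_score=16.0, n_items=40, n_options=4, n_students=30)
ratio = relative_no_passage_performance(mean_without=16.0, mean_with=24.0)
```

A large positive z indicates that the group answering without the passage did better than guessing alone would explain, which is the pattern that signals a lack of reading-dependency.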

Perhaps the following sample items will help to illustrate the
point:

Passage*
Comparison of transatlantic travel
between Pilgrims and travelers of 1955

25. Which is the best reason why the 1955 travelers were
    more comfortable than the Pilgrims?

    A. The Pilgrims were poor.
    B. The 1955 travelers were more intelligent.
    C. The ocean was less stormy.
    D. Between 1620 and 1955 many inventions had
       been made.

26. What kind of power was used to move the Mayflower?

    A. Wind blowing on sails
    B. Many oars pulled by slaves
    C. Steam engines using coal
    D. Gasoline engines

Passage*
Description of a family camping
and cooking out

15. Which is the best reason why John's family should observe
    fire laws in forests?

    E. Those who disobey are punished.
    F. Animals are frightened by campfires.
    G. Fires can cause great damage in forests.
    H. Few people know how to build a safe fire.

17. Who makes the laws about fires in forests?

    A. The forest rangers
    B. The government
    C. The people who sell the timber
    D. The men who cut the timber

*Sequential Tests of Educational Progress, Social Studies,
Form 4A, Educational Testing Service, 1956

Each of these items was preceded by a passage. Students were to
answer the items based upon information gained from the passage. However,
as one can see, many students would be able to answer such items
without reading the relevant passage at all. In short, these items
do not appear to be reading-dependent.

Some may ask whether it is indeed necessary for social studies items
to be reading-dependent. The answer to that question is a matter of test
validity and interpretation. If the purpose in testing is to assess
whether the student has achieved some predetermined level of
comprehension of social studies content, then reading-dependency is not
important. In this case, the sources of information from which the student
draws when answering are not important. On the other hand, if the
purpose is to determine how much information a student is able to gain
from reading social studies material, the items must be
reading-dependent. It is doubtful whether test publishers have come to
grips with this issue.

Reading-dependency is a relative matter that is related to the
knowledge and experiential background of the student. For one student
an item may be reading-dependent while for a more sophisticated student,
the same item is not reading-dependent. For practical reasons judgments
regarding the degree of reading-dependency cannot be made in terms of
individuals but have to be assessed on a group basis.
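
One way such a group-basis judgment might be made is to give the items to a group that never sees the passage and to ask, item by item, whether the number of correct answers exceeds what guessing would produce. The following Python sketch uses an exact one-sided binomial test for that purpose. The function names, the item labels, the counts, and the .05 criterion are all assumptions chosen for illustration.

```python
from math import comb

def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

def flag_passage_independent_items(correct_counts, n_students, n_options, alpha=0.05):
    """Return labels of items answered above chance by a no-passage group.

    correct_counts maps an item label to the number of students, out of
    n_students who never saw the passage, who answered that item correctly.
    """
    p_chance = 1 / n_options
    flagged = []
    for item, n_correct in correct_counts.items():
        # A small tail probability means guessing alone is an unlikely explanation.
        if binomial_upper_tail(n_correct, n_students, p_chance) < alpha:
            flagged.append(item)
    return flagged

# Hypothetical counts for 20 students answering four-option items
# without the passage (NOT data from any actual test).
counts = {"q1": 17, "q2": 18, "q3": 16, "q4": 6}
flagged = flag_passage_independent_items(counts, n_students=20, n_options=4)
```

Items that are flagged can be answered from general knowledge by this particular group. A different group, with a different background, could yield a different list, which is the sense in which reading-dependency is relative.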

2. Picture-Dependency of Social Studies Test Items

Much of what was said in the previous section could also apply to
the use of pictures on standardized social studies tests. On some
tests students are asked questions which are supposedly based upon
pictures included in the test. Careful examination of such questions,
however, reveals that many of them could probably be answered without
even referring to the picture. The following examples were taken from
standardized social studies tests that are currently on the market:



Picture of the produce section
of a grocery store in Ohio*

1.  Which food was raised on a farm somewhere in our country?

    A. Tea
    B. Apples
    C. Cocoa
    D. Coffee

(number illegible)  What food most likely traveled part of the way
    to the store by boat?

    E. Oranges
    F. Peas
    G. Bananas
    H. Celery

5.  Which food most likely traveled farthest to reach
    the store?

    A. Pineapples
    B. Lettuce
    C. Eggs
    D. Peaches

Series of pictures which tell the
story of bread-making*

20. Which picture should come first?

    E. The grain elevator
    F. The flour mill
    G. The bakery
    H. The shocks of wheat

21. Which picture should come next after the grain elevator?

    A. The bread on a store shelf
    B. The flour mill
    C. The bakery
    D. The slice of bread being buttered

*Sequential Tests of Educational Progress, Social Studies,
Form 4A, Educational Testing Service, 1956



Most students would be able to answer these items without referring
to the pictures provided in the test. What purpose do the pictures serve
in this case? My only conclusion is that they may serve as an aid to
students weak in word identification skills. For example, the student
who is unable to identify the word "pineapples" may refer to the
pictures, recognize the visual referent of "pineapples", and proceed
to answer the item. I seriously question if this was the purpose the
publishers intended the pictures to serve.

Once again the issue is one of test validity. If the intent is
to assess the student's ability to recognize certain social studies
concepts in pictures, then the items must be picture-dependent.
Perhaps what is needed is some rethinking of the role pictures should
play on social studies tests.

3. Cognitive Skills Assessed by Social Studies Items

If one critically examines the items found on standardized social
studies tests, he should find that in some cases the items strongly
resemble the kinds of items often found on group intelligence tests.
For example, many standardized social studies tests include an
assessment of the student's social studies vocabulary. Although this would
appear appropriate, some of the vocabulary sections of social studies
tests seem to extend into other cognitive areas as well. Perhaps the
following examples taken from the Metropolitan Achievement Test will
illustrate this point. The directions to the students are: "Read each
set of headings and the list of items following each set. Each item is
most closely associated with, or fits best under, one of the headings.
Decide which heading is best for each item."

Selected Items from the Intermediate Level*

Headings                                   Items

A. Commerce and Trade                      27. minority rights
B. Communication                           23. profit
C. Government                              35. A-bomb
D. Inventions                              40. power loom

Selected Items from the Advanced Level**

Headings                                   Items

A. Authors and Journalists                 21. Babe Ruth
B. Educational, Religious and              24. Eli Whitney
   Social Reformers
C. Scientists and Inventors                25. Edgar A. Poe
D. Leaders in Entertainment, Sports,       32. Wilt Chamberlain
   and Theater

*Metropolitan Achievement Test, Social Studies, Intermediate
Level, Form F, Harcourt Brace and Jovanovich, 1970

**Metropolitan Achievement Test, Social Studies, Advanced
Level, Form F, Harcourt Brace and Jovanovich, 1970

Critical examination of these items reveals several important
points. First, it appears that these items require the student to
employ some sort of classification skill over and above the knowledge
of the vocabulary used. Second, the vocabulary items are presented
in isolation, which is artificial and not reflective of the contextual
settings used in instruction. This is especially misleading for
vocabulary terms or phrases which may have a wide range of connotative
and denotative meanings. For example, according to the test publishers
the correct heading for "A-bomb" is "Inventions." Yet the student who
assigns "A-bomb" to the heading "Government" may have a far more
sophisticated understanding of that concept than his test results would
indicate.

One other point seems worth mentioning. On the surface, it
appears that the same test task is required for both the Intermediate
and Advanced levels above, that is, the assigning of vocabulary items
to general headings. However, the difference in vocabulary items
changes the skill required by the student considerably. Classifying
famous people would seem to require less cognitive skill than classifying
subjective terms such as "minority rights."

Test publishers have not clearly specified the cognitive skills
assessed by standardized social studies tests. In many cases a
variety of skills are lumped under an over-simplified heading such as
"knowledge of social studies vocabulary." Test users must be cautious
about judging a test by its name only.

4. Lack of an Adequate System for Developing Social Studies Test Items

Social studies test developers are suffering from a malady that has
plagued reading test developers for a number of years: the lack of
an adequate scheme or conceptual model for developing test items. To be
sure, test publishers have become very sophisticated at data analysis
after the items are constructed. Standardization procedures, item
analyses, and reliability and validity estimates are conducted with
efficiency and technical skill. Yet the actual development of test items
is largely done on a logical and intuitive basis. The point is, our
technical expertise of what to do with items after they're developed far
exceeds our knowledge of how to systematically construct items.

Consider for a moment the level of thinking regarding social
studies test development. Most social studies tests are defined by
the content they sample. References are made about the inclusion of
American history, geography, sociology, or some other sub-area of the
social studies domain. Occasionally the tests, or portions of them,
are defined in terms of student behavior such as the ability to read
maps or interpret graphs.

One still sees recommendations in the literature to construct a
topic-by-process matrix when designing a test. In such a matrix
behaviors are crossed with elements of content. The behaviors are
usually defined according to Bloom's Taxonomy (1956). This paradigm
has been around for years and is still resorted to in a good many
cases. The Taxonomy developed by Bloom and his associates sixteen
years ago was a step in the right direction but it may have outlived
its usefulness. As Bormuth (1970), Anderson (1972), Sullivan (1969),
and many other critics have convincingly argued, the categories within
the Taxonomy overlap and do not lead readily to operational definitions.
Anyone who has tried to develop test items based on the Taxonomy can
attest to the ambiguity involved. As Anderson (1972, p. 149) points
out, ". . . what is required is a system of explicit definitions and rules
to derive test items from instructional statements such that a person
can answer the items correctly if, and only if, he comprehends the
statements."
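
For readers unfamiliar with the topic-by-process matrix, the following Python sketch shows the idea in its simplest form: each item is classified by a content area and a behavior, and the cell counts show which combinations the test samples and which it omits. The content areas, the eight classified items, and the function names are hypothetical. The three behaviors are the first three categories of the Taxonomy, and the sketch does nothing to resolve the overlap among categories that the critics describe.

```python
from collections import Counter

# Hypothetical content areas and behaviors. A real blueprint would take
# these from the test publisher's own specification.
TOPICS = ["history", "geography", "sociology"]
PROCESSES = ["knowledge", "comprehension", "application"]

def build_matrix(items):
    """Count items in each (topic, process) cell.

    items is an iterable of (topic, process) pairs, one pair per test item.
    """
    counts = Counter(items)
    return {t: {p: counts.get((t, p), 0) for p in PROCESSES} for t in TOPICS}

def empty_cells(matrix):
    """List the (topic, process) cells that no item samples."""
    return [(t, p) for t in TOPICS for p in PROCESSES if matrix[t][p] == 0]

# Hypothetical classification of eight items (NOT from any published test).
items = [
    ("history", "knowledge"), ("history", "knowledge"), ("history", "knowledge"),
    ("history", "knowledge"), ("history", "comprehension"),
    ("geography", "knowledge"), ("geography", "application"),
    ("sociology", "knowledge"),
]
matrix = build_matrix(items)
gaps = empty_cells(matrix)
```

The counting is trivial. The difficulty lies in the step the sketch takes for granted, namely deciding to which cell an item belongs.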



Do such systems exist? Yes, there are such systems on the
horizon, but at the present time they are exploratory and have not yet
proven themselves. John Bormuth (1970), in his book, On the Theory
of Achievement Test Items, presents a linguistic rationale for deriving
test items from instructional statements. Essentially it involves
making grammatical transformations to form different classes of test
items.

Schlesinger and Weiser (1970) of the Israel Institute of Applied
Social Research have proposed a facet design for the systematic
construction of items for a reading comprehension test. Their facet
design ". . . concentrates on the relationship between the test item and
the text on which it is based, rather than on the skills and abilities
presumably involved in answering the item." (p. 563) The
classificatory scheme of this model would simultaneously include the correct
answer as well as the incorrect distractors.

Another model that has received some attention is that of
"domain-referenced achievement testing" proposed by Hively et al. (1968).
In this approach, rules are specified to generate a universe or domain
of every possible test item of interest in a field of knowledge. A
test is formed by sampling from the universe in a partly random
fashion. Hively and colleagues (1968) have worked out a system of
rules for generating a universe of items to cover elementary
mathematics. While elementary mathematics may be a relatively easy field
in which to apply such a model, perhaps efforts should be made in other
domains of knowledge, such as social studies.
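
The general idea of generating a universe of items by rule and then sampling from it can be sketched in a few lines of Python. The two item forms below, the function names, and the equal-allocation sampling plan are hypothetical illustrations of the approach. They are not a reconstruction of the rules Hively and his colleagues actually used.

```python
import random
from itertools import product

def addition_item_form(a_values, b_values):
    """Item form: every 'a + b = ?' item for the given operand ranges."""
    return [
        {"stem": f"{a} + {b} = ?", "answer": a + b}
        for a, b in product(a_values, b_values)
    ]

def build_universe():
    """Universe of items grouped by item form (both forms are hypothetical)."""
    return {
        "one_digit_addition": addition_item_form(range(0, 10), range(0, 10)),
        "two_digit_plus_one_digit": addition_item_form(range(10, 100), range(0, 10)),
    }

def draw_test(universe, items_per_form, seed=None):
    """Draw the same number of items at random from each item form.

    Sampling is random within a form but fixed across forms, which is one
    possible reading of sampling "in a partly random fashion".
    """
    rng = random.Random(seed)
    test = []
    for form_name in sorted(universe):
        test.extend(rng.sample(universe[form_name], items_per_form))
    return test

universe = build_universe()
test = draw_test(universe, items_per_form=5, seed=1972)
```

Because the universe is fully specified, a score on any test drawn from it can be interpreted as an estimate of performance on the whole domain. Whether comparable generating rules can be written for social studies content remains an open question.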






As one can see, empirical item selection procedures are emerging.
One of the common factors found in many of these new models is the
relationship between the wording of the test item and the wording of
instruction. Social studies test developers as well as social studies
practitioners will need to become more linguistically oriented in
the future if they are to understand and apply rational item selection
procedures.

5. Lack of Content Validity

This last major criticism could perhaps be considered a summary
of all the criticisms made previously. Standardized social studies
tests, for the most part, lack a clear and operational definition of
content validity. It is not clear to test users what such tests are
actually measuring. Teachers and principals use the test results to
make statements about their students' levels of "social studies
achievement" with only a vague and ambiguous understanding of that concept.

The American Psychological Association's Standards for Educational
and Psychological Tests and Manuals (1966) makes the following
recommendation regarding content validity:

     "If a test performance is to be interpreted as a
     sample of performance or a definition of
     performance in some universe of situations, the manual
     should indicate clearly what universe is
     represented and how adequate is the sampling."

The concept of "social studies achievement" could be defined along
two dimensions: the universe of content and the universe of behaviors.
How adequately have publishers sampled from these two dimensions?






Content validity is frequently described in social studies test
manuals as a process of carefully surveying and sampling textbooks,
courses of study, or curriculum guides to obtain a representative
segment of social studies content. However, rarely are the surveyed
materials identified. Also, the copyright dates of the materials are
almost never given. Critical reviews of social studies tests found
in Buros (1971) suggest that the sampling has been less than adequate.

In terms of the universe of content, criticism has often been made
of the overemphasis on history at the expense of multidisciplinary
fields such as anthropology and sociology. Critics of the sampling
from the universe of behaviors have pointed to emphasis on recall of
factual information to the exclusion of such behaviors as critical
reading, analysis, and application. When test publishers have tried
to define their test by both content and behaviors, the relationship
between the two dimensions has been vague and not clearly spelled out.

Until test publishers adequately define the content of social
studies tests, test users will have difficulty making meaningful
interpretations of student performance.



Summary

The purpose of this paper was to critically examine standardized
social studies achievement tests from a psychometric and reading
perspective. Five major issues were identified that detract from the
meaningful interpretation of student performance on standardized
social studies tests. The issues discussed were (1) the reading-
dependency of social studies items, (2) the picture-dependency of
social studies items, (3) the cognitive skills assessed by social
studies items, (4) the lack of an adequate system of item development,
and (5) the lack of content validity. Each of the issues was defined
and related to test validity and interpretation. The central theme
running throughout the paper was the inadequacy of the content validity
of most standardized social studies tests. Unless test publishers
specify more explicitly the elements of content and types of behavior
sampled on their tests, test users will have great difficulty making
meaningful interpretations of student performance.



References

American Psychological Association, Standards for Educational and
     Psychological Tests and Manuals, Washington, D.C.: American
     Psychological Association, 1966

Anderson, R. C. "How to Construct Achievement Tests to Assess
     Comprehension", Review of Educational Research, 42 (1972), 145-170

Bloom, B. S. (ed.) Taxonomy of Educational Objectives: Cognitive
     Domain, New York: David McKay Inc., 1956

Bormuth, J. R. On the Theory of Achievement Test Items, Chicago:
     University of Chicago Press, 1970

Buros, O. K. The Seventh Mental Measurements Yearbook, volumes 1 and
     2, Highland Park, New Jersey: Gryphon Press, 1972

Gaines, W. G. An Application of John B. Carroll's Model of School
     Learning to the Teaching of Anthropology, unpublished doctoral
     dissertation, University of Georgia, 1971

Gaines, W. G. and Jongsma, E. A. "The Effects of Instruction in Test-
     Taking Skills on Standardized Achievement Test Scores of Fifth
     Grade Pupils", in press, 1972

Hively, W., Patterson, H. L. and Page, S. H. Generalization of
     Performance by Job Corps Trainees on a Universe-Defined System
     of Achievement Tests in Elementary Mathematical Calculation,
     Saint Paul: Minnesota National Laboratory, 1968

Hively, W. "Domain-Referenced Achievement Testing", Symposium
     presented at the American Educational Research Association
     Convention, February, 1970

Mitchell, R. W. "A Comparison of Children's Responses to an Original
     and Experimental Form of Subtests GS and ND of the Gates Basic
     Reading Test", unpublished doctoral dissertation, University of
     Minnesota, 1967

Preston, R. C. "Ability of Students to Identify Correct Responses
     before Reading", Journal of Educational Research, 58 (1964),
     131-133

Preston, R. C. and Duffey, R. V. Primary Social Studies Test, Boston:
     Houghton Mifflin, 1967

Schlesinger, I. M. and Weiser, Z. "A Facet Design for Tests of
     Reading Comprehension", Reading Research Quarterly, 5 (1970),
     566-580

Sullivan, H. J. "Objectives, Evaluation, and Improved Learner
     Achievement" in Instructional Objectives, R. E. Stake (ed.),
     American Educational Research Association Monograph Series on
     Curriculum Evaluation, Chicago: Rand McNally, 1969

Thomas, G. The Use of Programmed Instruction for Teaching
     Anthropology in the Fifth Grade, unpublished doctoral dissertation,
     University of Georgia, 1967

Tuinman, J. "Selected Aspects of the Assessment of the Acquisition
     of Information from Reading Passages", unpublished doctoral
     dissertation, University of Georgia, 1970

Wash, J. A. An Evaluation of Pupil Performance in the Anthropology
     Curriculum Project, Grades 1, 2, 4, and 5, General Information
     Series No. 5, Anthropology Curriculum Project, University of
     Georgia, 1967

Weaver, W. and Bickley, A. C. "Sources of Information for Responses to
     Reading Test Items", APA Proceedings, 75th Annual Convention,
     1967, 293-294



